
# Multi-task Visual Understanding

PE Spatial G14 448
Apache-2.0
The Perception Encoder (PE) is a state-of-the-art image and video understanding encoder trained through simple vision-language learning.
facebook
3,256
16
Florence 2 Base
MIT
Florence-2 is an advanced vision foundation model developed by Microsoft, employing a prompt-based approach to handle a wide range of vision and vision-language tasks.
Text-to-Image Transformers
microsoft
316.74k
264
© 2025 AIbase